-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maint: Restrict empty-matching regexes #1548
Conversation
* an empty-matching regex may not be used without a predicate (such as :pop!, :push, or a block) * an empty-matching regex with no lookahead/lookbehind or anchoring "closes" the state - it is invalid to add more rules, as they would never be reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what prompted the fix? Did you come across the infinite loop bug in your own work or did you just notice it was possible?
Otherwise looks good to me. Want me to merge it?
I don't quite remember what prompted this, but I think I had seen it come up somewhere. It's a really difficult bug to track down if you're not super familiar with the internals of Rouge. |
But yes, I think this is ready to merge. |
The maintenance changes in #1548 resulted in updates to two rules in the Ruby lexer, one regarding regular expression flags and the other regarding heredoc strings. However, these patterns should not have been changed. The change in #1548 was to prevent rules that matched an empty string unless they contained a 'predicate' of some sort. Both rules in the Ruby lexer contained such a predicate and so should not have been changed. This commit fixes that error.
The changes made in #1548 to avoid empty regular expression patterns, broke the template string rules in the JavaScript lexer (and lexers such as the TypeScript lexer that inherit from it). This commit adds the rules necessary to fix this lexing.
This commit makes several changes to the way ‘empty’ regular expressions work in Rouge. 1. The use of ‘empty’ regular expressions is restricted. An empty-matching regex may not be used without a predicate (such as `:pop!`, `:push`, or a block). 2. The use of an empty-matching regular expression with no lookahead/lookbehind or anchoring "closes" the state. It is invalid to add more rules, as they would never be reached. 3. Lexers that were using improper empty-matching regular expressions have been fixed.
The merging of rouge-ruby#1548 caused issues with two lexers that had been added after the PR was originally submitted. This commit fixes those errors. It also reduces the number of calls to `String#upcase` and `String#downcase` in the Apex lexer (this lexer was updated in rouge-ruby#1548).
The maintenance changes in rouge-ruby#1548 resulted in updates to two rules in the Ruby lexer, one regarding regular expression flags and the other regarding heredoc strings. However, these patterns should not have been changed. The change in rouge-ruby#1548 was to prevent rules that matched an empty string unless they contained a 'predicate' of some sort. Both rules in the Ruby lexer contained such a predicate and so should not have been changed. This commit fixes that error.
The changes made in rouge-ruby#1548 to avoid empty regular expression patterns, broke the template string rules in the JavaScript lexer (and lexers such as the TypeScript lexer that inherit from it). This commit adds the rules necessary to fix this lexing.
It is very common for a lexer developer to accidentally include a regex that matches the empty string, which can cause a number of issues. This patch restricts the use of empty-matching regexes in two ways:
A regex that matches the empty string must have a predicate - either pushing a new state,
:pop!
, or a custom block. Otherwise it will infinite-loop.Any empty-matching regex that does not contain lookahead/lookbehind causes the state to be "closed" - no more rules can be added. This is because these regexes must serve as "default" rules, and it's guaranteed that the processing won't continue past them.